Retrieving website data via HTTP
This small demo program shows how to fetch the contents of a web page. All links are then extracted from the content and displayed.
We use the class CL_HTTP_CLIENT to fetch the source of a web page. The URL must be specified in full, including the scheme (http:// or https://).
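Because the URL has to carry its scheme, a small guard in front of the request can catch incomplete input early. The following is only a minimal sketch and not part of the original program; the form name check_url and its parameters are hypothetical.

```abap
*:: Hypothetical helper: checks whether a URL starts with http:// or https://
FORM check_url USING    iv_url TYPE string
               CHANGING cv_ok  TYPE abap_bool.

  " CP compares case-insensitively; '*' matches any remaining characters
  IF iv_url CP 'http://*' OR iv_url CP 'https://*'.
    cv_ok = abap_true.
  ELSE.
    cv_ok = abap_false.
  ENDIF.

ENDFORM.
```

It could be called with PERFORM check_url USING p_url CHANGING lv_ok, where lv_ok is a local variable of type abap_bool, before the HTTP client is created.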
Code
*:: Selection screen
PARAMETERS p_url TYPE string LOWER CASE
  DEFAULT 'https://tricktresor.de/wp-content/index.php?aID=0'.

START-OF-SELECTION.
  PERFORM get_urls USING p_url.

*&---------------------------------------------------------------------*
*&      Form  GET_URLS
*&---------------------------------------------------------------------*
FORM get_urls USING iv_url TYPE clike.

*:: local data
  DATA lr_http_client TYPE REF TO if_http_client.
  DATA lv_html_code   TYPE string.
  DATA lt_urls        TYPE STANDARD TABLE OF string WITH NON-UNIQUE DEFAULT KEY.
  DATA lv_url         TYPE string.
  DATA lv_offset      TYPE i.
  DATA lv_dummy1      TYPE string.
  DATA lv_dummy2      TYPE string.

*:: create HTTP client for the url
  CALL METHOD cl_http_client=>create_by_url
    EXPORTING
      url                = iv_url
    IMPORTING
      client             = lr_http_client
    EXCEPTIONS
      argument_not_found = 1
      plugin_not_active  = 2
      internal_error     = 3
      OTHERS             = 4.
  IF sy-subrc <> 0.
*:: error
    WRITE: AT 40 'Unable to create client for URL, sy-subrc:', sy-subrc.
    RETURN.
  ENDIF.

*:: Send out request
  CALL METHOD lr_http_client->send
    EXCEPTIONS
      http_communication_failure = 1
      http_invalid_state         = 2
      http_processing_failed     = 3
      OTHERS                     = 4.

  IF sy-subrc = 0.
*:: Receive result
    CALL METHOD lr_http_client->receive
      EXCEPTIONS
        http_communication_failure = 1
        http_invalid_state         = 2
        http_processing_failed     = 3
        OTHERS                     = 4.
  ENDIF.

  IF sy-subrc <> 0.
*:: error
    WRITE: AT 40 'Unable to read data, sy-subrc:', sy-subrc.
  ELSE.
*:: Get source code
    lv_html_code = lr_http_client->response->get_cdata( ).
    WRITE: / iv_url COLOR 5.

*:: simple method - find urls
    SPLIT lv_html_code AT 'href=' INTO TABLE lt_urls.
    LOOP AT lt_urls INTO lv_url.
      FORMAT COLOR OFF.
      CHECK lv_url IS NOT INITIAL.
*:: only quoted links are evaluated
      CHECK lv_url(1) = `"` OR lv_url(1) = `'`.
*:: the link ends at the matching closing quote
      FIND lv_url(1) IN lv_url+1 MATCH OFFSET lv_offset.
      CHECK sy-subrc = 0.
      lv_url = lv_url+1(lv_offset).

      IF lv_url IS INITIAL.
        CONCATENATE iv_url lv_url INTO lv_url.
      ELSEIF lv_url(1) = '#'.
*:: anchor on the requested page
        CONCATENATE iv_url lv_url INTO lv_url.
      ELSEIF lv_url(1) = '/'.
*:: relative to the server root
        FORMAT COLOR COL_GROUP.
      ELSEIF lv_url(1) = '?'.
*:: new query string: replace the one of the requested url
        SPLIT iv_url AT '?' INTO lv_dummy1 lv_dummy2.
        CONCATENATE lv_dummy1 lv_url INTO lv_url.
      ELSEIF lv_url CP 'https*'.
*:: CP ignores case and does not dump on links shorter than the pattern
        FORMAT COLOR COL_POSITIVE.
      ELSEIF lv_url CP 'http*'.
        FORMAT COLOR COL_NORMAL.
      ENDIF.

*:: highlight links that start with the requested url
*:: (literal search; the url contains regex special characters like . and ?)
      FIND iv_url IN lv_url MATCH OFFSET lv_offset.
      IF sy-subrc = 0 AND lv_offset = 0.
        FORMAT INTENSIFIED ON.
      ELSE.
        FORMAT INTENSIFIED OFF.
      ENDIF.
      WRITE: /10 lv_url.
    ENDLOOP.
    ULINE.
  ENDIF.

*:: release the connection
  CALL METHOD lr_http_client->close
    EXCEPTIONS
      OTHERS = 1.

ENDFORM.                    " GET_URLS
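The program above finds links by splitting the HTML at 'href='. As an alternative, the links could also be collected with a regular expression. The following is a hedged sketch under that assumption, not the author's method; the form name extract_links and its parameters are hypothetical. Like the original, it only considers quoted href values.

```abap
*:: Hypothetical helper: collects all quoted href values from an HTML string
FORM extract_links USING    iv_html  TYPE string
                   CHANGING ct_links TYPE string_table.

  DATA lt_matches TYPE match_result_tab.
  DATA ls_match   TYPE match_result.
  DATA ls_sub     TYPE submatch_result.
  DATA lv_link    TYPE string.

  CLEAR ct_links.

  " Submatch 1 is the opening quote, submatch 2 the link itself;
  " \1 requires the same quote character to close the value
  FIND ALL OCCURRENCES OF REGEX `href\s*=\s*(["'])([^"']*)\1`
       IN iv_html IGNORING CASE
       RESULTS lt_matches.

  LOOP AT lt_matches INTO ls_match.
    READ TABLE ls_match-submatches INTO ls_sub INDEX 2.
    " skip empty links such as href=""
    CHECK sy-subrc = 0 AND ls_sub-length > 0.
    lv_link = iv_html+ls_sub-offset(ls_sub-length).
    APPEND lv_link TO ct_links.
  ENDLOOP.

ENDFORM.
```

With a local table of type string_table, the form could be called as PERFORM extract_links USING lv_html_code CHANGING lt_links. The classification and coloring of each link would then work on the extracted values as before.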